
748 research outputs found

A region-centered topic model for object discovery and category-based image segmentation

Author: Díaz de María Fernando
González Díaz Iván
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Latent topic models have become a popular paradigm in many computer vision applications due to their capability to discover semantics in visual content without supervision. Relying on the Bag-of-Words representation, they consider images as mixtures of latent topics that generate visual words according to some specific distributions. However, the performance of these methods is still limited by the way in which they take into account the spatial distribution of visual words and, more importantly, by the appearance distributions currently in use. In this paper, we propose a novel region-centered latent topic model that introduces two main contributions: first, an improved spatial context model that allows for considering inter-topic inter-region influences; and second, an advanced region-based appearance distribution built on the Kernel Logistic Regressor. It is worth highlighting that the proposed contributions have been seamlessly integrated in the model, so that all the parameters are concurrently estimated using a unified inference process. Furthermore, the proposed model has been extended to work in both unsupervised and supervised modes. Our results for unsupervised mode improve on those of previous latent topic models by 30%. For supervised mode, where discriminative approaches are predominant, our results are quite close to those of discriminative state-of-the-art methods. This work has been partially supported by the project AFICUS, co-funded by the Spanish Ministry of Industry, Trade and Tourism, and the European Fund for Regional Development, with Ref.: TSI-020110-2009-103, and the National Grant TEC2011-26807 of the Spanish Ministry of Science and Innovation.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Adaptive Multi-Pattern Fast Block-Matching Algorithm Based on Motion Classification Techniques

Author: Díaz de María Fernando
Frutos-López Manuel de
González Díaz Iván
Sanz Rodríguez-Escalona Sergio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Motion estimation is the most time-consuming subsystem in a video codec. Thus, more efficient methods of motion estimation should be investigated. Real video sequences usually exhibit a wide range of motion content as well as different degrees of detail, which are particularly difficult for typical block-matching algorithms to manage. Recent developments in the area of motion estimation have focused on adaptation to video content. Adaptive thresholds and multi-pattern search algorithms have been shown to achieve good performance when they succeed in adjusting to motion characteristics. This paper proposes an adaptive algorithm, called MCS, that makes use of a specially tailored classifier that detects some motion cues and chooses the search pattern that best fits them. Specifically, a hierarchical structure of binary linear classifiers is proposed. Our experimental results show that MCS notably reduces the computational cost with respect to a state-of-the-art method while maintaining the quality.
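For readers unfamiliar with block matching, the following is a minimal sketch of the exhaustive (full-search) baseline that fast multi-pattern algorithms try to approximate at lower cost. It is not the MCS algorithm from the abstract: the function names `sad` and `full_search`, the sum-of-absolute-differences cost, and the toy frames are all assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, radius):
    """Test every displacement within +/- radius and return
    (best_dx, best_dy, best_sad). Candidates outside `ref` are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy frames (hypothetical data): every pixel of `ref` has a unique value,
# and `cur` is `ref` shifted right by 2 pixels and down by 1 pixel.
ref = [[16 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[(y - 1) % 16][(x - 2) % 16] for x in range(16)] for y in range(16)]

# The block at (4, 4) in `cur` is found at displacement (-2, -1) in `ref`.
motion = full_search(cur, ref, bx=4, by=4, n=4, radius=3)
```

Full search evaluates (2 * radius + 1)^2 candidates per block, which is the cost that pattern-based searches reduce by testing only a subset of displacements.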

DepositOnce

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Mid-level feature set for specific event and anomaly detection in crowded scenes

Author: Calle Silos Fernando de la
Díaz de María Fernando
González Díaz Iván
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Proceedings of: 20th IEEE International Conference on Image Processing (ICIP 2013). Melbourne, Australia, September 15-18, 2013. In this paper we propose a system for automatic detection of specific events and abnormal behaviors in crowded scenes. In particular, we focus on the parametrization by proposing a set of mid-level spatio-temporal features that successfully model the characteristic motion of typical events in crowd behaviors. Furthermore, because some features are more suitable than others to model specific events of interest, we also present an automatic process for feature selection. Our experiments show that the suggested feature set works successfully for both explicit event detection and distance-based anomaly detection tasks. The results on PETS for explicit event detection are generally better than those previously reported. Regarding anomaly detection, the performance of the proposed method is comparable to that of state-of-the-art methods for PETS and substantially better than that reported for the Web dataset.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web

Author: Díaz de María Fernando
Gallardo Antolín Ascensión
Peláez Moreno Carmen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001

The Internet Protocol (IP) environment poses two relevant sources of distortion to the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we avoid the influence of other sources of distortion due to the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained by the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most predominant coding algorithms in voice over IP (VoIP), and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated packet loss rates. Furthermore, the improvement is higher as network conditions worsen.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Uncertainty decoding on Frequency Filtered parameters for robust ASR

Author: Díaz de María Fernando
Vicente Peña Jesús de
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010

The use of feature enhancement techniques to obtain estimates of the clean parameters is a common approach for robust automatic speech recognition (ASR). However, the decoding algorithm typically ignores how accurate these estimates are. Uncertainty decoding methods incorporate this type of information. In this paper, we develop a formulation of the uncertainty decoding paradigm for Frequency Filtered (FF) parameters using spectral subtraction as a feature enhancement method. Additionally, we show that the uncertainty decoding method for FF parameters admits a simple interpretation as a spectral weighting method that assigns more importance to the most reliable spectral components. Furthermore, we suggest combining this method with SSBD-HMM (Spectral Subtraction and Bounded Distance HMM), a recently proposed technique that is able to compensate for the effects of features that are highly contaminated (outliers). This combination pursues two objectives: to improve the results achieved by uncertainty decoding methods and to determine which part of the improvements is due to compensating for the effects of outliers and which part is due to compensating for other less deteriorated features.
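As a rough illustration of the general uncertainty decoding idea, one common formulation adds the variance of the enhancement error to the variance of each acoustic-model Gaussian before scoring the enhanced feature. The sketch below assumes diagonal covariances and is not the FF-specific formulation developed in the paper; the function name `log_gauss_uncertain` and the numbers are hypothetical.

```python
import math


def log_gauss_uncertain(x_hat, mean, var, unc_var):
    """Log-likelihood of an enhanced feature vector under a diagonal Gaussian
    whose per-dimension variance is inflated by the estimate's uncertainty.
    With unc_var all zeros this is the ordinary Gaussian log-likelihood."""
    total = 0.0
    for x, m, v, u in zip(x_hat, mean, var, unc_var):
        s = v + u  # model variance plus enhancement-error variance
        total += -0.5 * (math.log(2.0 * math.pi * s) + (x - m) ** 2 / s)
    return total


# Hypothetical two-dimensional example: the second component is far from
# the model mean, as a badly enhanced (unreliable) feature might be.
x_hat = [0.0, 5.0]
mean = [0.0, 0.0]
var = [1.0, 1.0]

plain = log_gauss_uncertain(x_hat, mean, var, [0.0, 0.0])
# Declaring the second component unreliable reduces its influence on the score.
weighted = log_gauss_uncertain(x_hat, mean, var, [0.0, 24.0])
```

Inflating the variance flattens the Gaussian along unreliable dimensions, so those components contribute less to the decoder's decision, which is consistent with the weighting interpretation mentioned in the abstract.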

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition

Author: Díaz de María Fernando
Gallardo Antolín Ascensión
Peláez Moreno Carmen
Vicente Peña Jesús de
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

In this paper we address the problem of automatic speech recognition when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: acoustic environment, speech coding and transmission errors. Whilst the first one has already received a lot of attention, the last two deserve further investigation in our opinion. We have found that band-pass filtering of the recognition features improves ASR performance when distortions due to these particular communication systems are present. Furthermore, we have evaluated two alternative configurations at different bit error rates (BER) typical of these channels: band-pass filtering the LP-MFCC parameters and a modification of RASTA-PLP using a sharper low-pass section. These perform consistently better than LP-MFCC and RASTA-PLP, respectively.
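To make the idea of filtering the time sequences of spectral parameters concrete, the sketch below band-pass filters each feature coefficient along the frame axis. The filter used here (a short moving average minus a long moving average) is a stand-in chosen for simplicity, not the filter designed in the paper; the function names and window lengths are assumptions.

```python
def moving_average(seq, width):
    """Centered moving average; the window is truncated at the edges."""
    half = width // 2
    out = []
    for t in range(len(seq)):
        lo, hi = max(0, t - half), min(len(seq), t + half + 1)
        out.append(sum(seq[lo:hi]) / (hi - lo))
    return out


def band_pass_trajectory(seq, short=3, long=11):
    """Keep mid-rate variation of one coefficient's trajectory."""
    smooth = moving_average(seq, short)  # attenuates fast frame-to-frame jitter
    trend = moving_average(seq, long)    # tracks slow drift and constant offsets
    return [s - d for s, d in zip(smooth, trend)]


def band_pass_features(frames, short=3, long=11):
    """frames: list of feature vectors, one per time frame.
    Each coefficient's trajectory is filtered independently."""
    n_coef = len(frames[0])
    filtered = [
        band_pass_trajectory([f[k] for f in frames], short, long)
        for k in range(n_coef)
    ]
    return [[filtered[k][t] for k in range(n_coef)] for t in range(len(frames))]


# A constant offset (for example, a fixed channel bias) is removed entirely.
constant = [[5.0, -2.0] for _ in range(20)]
flat = band_pass_features(constant)
```

Filtering along time rather than along frequency means slowly varying distortions are suppressed without changing how each frame's features are computed.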

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad Carlos III de Madrid e-Archivo

In-layer multi-buffer framework for rate-controlled scalable video coding

Author: Díaz de María Fernando
Sanz Rodríguez-Escalona Sergio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

Temporal scalability is supported in scalable video coding (SVC) by means of hierarchical prediction structures, where the higher layers can be ignored for frame rate reduction. Nevertheless, this kind of scalability is not fully exploited by the rate control (RC) algorithms since the hypothetical reference decoder (HRD) requirement is only satisfied for the highest frame rate sub-stream of every dependency (spatial or coarse grain scalability) layer. In this paper we propose a novel RC approach that aims to deliver several HRD-compliant temporal resolutions within a particular dependency layer. Instead of using the common SVC encoder configuration consisting of one dependency layer per temporal resolution, a compact configuration that does not require additional dependency layers for providing different HRD-compliant temporal resolutions is proposed. Specifically, the proposed framework for rate-controlled SVC uses a set of virtual buffers within a dependency layer so that their levels can be simultaneously controlled for overflow and underflow prevention while minimizing the reconstructed video distortion of the corresponding sub-streams. This in-layer multi-buffer approach has been built on top of a baseline H.264/SVC RC algorithm for variable bit rate applications. The experimental results show that our proposal achieves good performance in terms of mean quality, quality consistency, and buffer control using a reduced number of layers. This work has been partially supported by the National Grant TEC2011-26807 of the Spanish Ministry of Science and Innovation.
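The notion of several virtual buffers inside one dependency layer can be pictured with a simple leaky-bucket simulation, one bucket per temporal sub-stream. This is only a sketch under assumed conventions (each buffer starts half full, sub-stream k contains temporal layers 0 to k, and each buffer drains by its target rate divided by its frame rate per frame); it does not reproduce the rate control algorithm of the paper, and all names and numbers are hypothetical.

```python
def simulate_virtual_buffers(frames, rates, frame_rates, sizes):
    """frames: list of (temporal_id, bits) in coding order.
    rates[k], frame_rates[k], sizes[k]: target bit rate (bits/s), frame rate
    (frames/s) and buffer size (bits) of the sub-stream with layers 0..k.
    Returns (final fullness per buffer, list of overflow/underflow events)."""
    fullness = [size / 2 for size in sizes]  # assumption: start half full
    events = []
    for i, (tid, bits) in enumerate(frames):
        for k in range(len(rates)):
            if tid > k:
                continue  # this frame is not part of sub-stream k
            # Fill with the coded bits, drain by the per-frame channel budget.
            fullness[k] += bits - rates[k] / frame_rates[k]
            if fullness[k] > sizes[k]:
                events.append((i, k, "overflow"))
                fullness[k] = sizes[k]
            elif fullness[k] < 0:
                events.append((i, k, "underflow"))
                fullness[k] = 0.0
    return fullness, events


# Two temporal resolutions: 15 frames/s (layer 0 only) and 30 frames/s (layers 0-1).
frames = [(0, 1000), (1, 500), (0, 1000), (1, 500)]
levels, events = simulate_virtual_buffers(
    frames, rates=[15000, 22500], frame_rates=[15, 30], sizes=[4000, 4000]
)
```

Because every frame of a low temporal layer feeds all buffers at or above its layer, a single bit allocation decision affects several buffer levels at once, which is why they need to be controlled jointly.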

DepositOnce

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

A Generative Model for Concurrent Image Retrieval and ROI Segmentation

Author: Baz-Hormigos Carlos E.
Díaz de María Fernando
González Díaz Iván
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

This paper proposes a probabilistic generative model that concurrently tackles the problems of image retrieval and region-of-interest (ROI) segmentation. Specifically, the proposed model takes into account several properties of the matching process between two objects in different images, namely: objects undergoing a geometric transformation, typical spatial location of the region of interest, and visual similarity. In this manner, our approach improves the reliability of detected true matches between any pair of images. Furthermore, by taking advantage of the links to the ROI provided by the true matches, the proposed method is able to perform a suitable ROI segmentation. Finally, the proposed method is able to work when there is more than one ROI in the query image. Our experiments on two challenging image retrieval datasets showed that our approach clearly outperforms the most prevalent approach for geometrically constrained matching and compares favorably to most of the state-of-the-art methods. The proposed technique concurrently provided very good segmentations of the ROI. In addition, the capability of the proposed method to take into account several objects of interest was tested in three experiments: two of them concerning image segmentation and object detection in multi-object image retrieval tasks, and another concerning multiview image retrieval. These experiments showed the ability of our approach to handle scenarios in which more than one object of interest is present in the query. This work has been partially supported by the project AFICUS, co-funded by the Spanish Ministry of Industry, Trade and Tourism, and the European Fund for Regional Development, with Ref.: TSI-020110-2009-103, and the National Grant TEC2011-26807 of the Spanish Ministry of Science and Innovation.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Rate Control Initialization Algorithm for Scalable Video Coding

Author: Díaz de María Fernando
Sanz Rodríguez-Escalona Sergio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

Proceedings of: 18th IEEE International Conference on Image Processing (ICIP), 2011. In this paper we propose a novel rate control initialization algorithm for real-time H.264/scalable video coding. In particular, a two-step approach is proposed. First, the initial quantization parameter (QP) for each layer is determined by means of a parametric rate-quantization (R-Q) model that depends on the layer identifier (base or enhancement) and on the type of scalability (spatial or quality). Second, an intra-frame QP refinement method that adapts the initial QP value when needed is carried out over the first three coded frames in order to take into consideration both the buffer control and the spatio-temporal complexity of the scene. The experimental results show that the proposed R-Q modeling for initial QP estimation, in combination with the intra-frame QP refinement method, provides good performance in terms of visual quality and buffer control, achieving results remarkably similar to those achieved by using ideal initial QP values. Supported by the Spanish National grant TSI-020110-2009-103 (AFICUS) and the Regional grant CCG10-UC3M/TIC-5570 (AMASSACA).

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo